Grounding event references in news

Author: Nothman Joel
Publication venue: Faculty of Engineering and Information Technologies, School of Information Technologies
Publication date: 01/01/2014

Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better considered through explicit references to background events. In this context, we propose the event linking task which, analogous to named entity linking or disambiguation, models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation.
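
The retrieval step mentioned in the abstract above, which combines temporal and term overlap features with weights, might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's system: the function names, the Jaccard overlap, the exponential date decay, the `scale_days` parameter, and the fixed weights `w_term` and `w_time` are all hypothetical (the abstract says the weights are learned from hyperlink corpora and does not specify the features).

```python
import math
from datetime import date

def term_overlap(reference_tokens, article_tokens):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    ref, art = set(reference_tokens), set(article_tokens)
    union = ref | art
    return len(ref & art) / len(union) if union else 0.0

def temporal_score(reference_date, article_date, scale_days=30.0):
    """Hypothetical temporal feature: decays with the gap in days.

    An article published after the referring text cannot be the first
    report of a past event, so it scores 0.
    """
    gap = (reference_date - article_date).days
    if gap < 0:
        return 0.0
    return math.exp(-gap / scale_days)

def rank_candidates(reference, candidates, w_term=1.0, w_time=1.0):
    """Rank candidate archive articles by a weighted sum of the two features.

    The weights are fixed placeholders here; a real system would learn them.
    """
    scored = []
    for cand in candidates:
        score = (w_term * term_overlap(reference["tokens"], cand["tokens"])
                 + w_time * temporal_score(reference["date"], cand["date"]))
        scored.append((score, cand["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand_id for _, cand_id in scored]

# Toy data, invented for illustration only.
reference = {"tokens": ["election", "result", "labor"], "date": date(2010, 6, 1)}
candidates = [
    {"id": "a1", "tokens": ["election", "result", "labor", "win"], "date": date(2010, 5, 25)},
    {"id": "a2", "tokens": ["cricket", "score"], "date": date(2010, 5, 30)},
    {"id": "a3", "tokens": ["election", "result", "labor"], "date": date(2010, 7, 1)},
]
ranking = rank_candidates(reference, candidates)
```

In this toy setup the top-ranked article is the one that both shares terms with the reference and precedes it in time; a bag-of-words baseline would correspond to using the term feature alone.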

Sydney eScholarship

A Joint Model for Entity Analysis: Coreference, Typing, and Linking

Author: Duchi John
Hachey Ben
Nothman Joel
Publication venue: 'MIT Press - Journals'

Crossref

Analysing Wikipedia and gold-standard corpora for NER training

Author: James R. Curran
Joel Nothman
Tara Murphy
Publication date: 01/01/2009

Named entity recognition (NER) for English typically involves one of three gold standards: MUC, CoNLL, or BBN, all created by costly manual annotation. Recent work has used Wikipedia to automatically create a massive corpus of named entity annotated text. We present the first comprehensive cross-corpus evaluation of NER. We identify the causes of poor cross-corpus performance and demonstrate ways of making them more compatible. Using our process, we develop a Wikipedia corpus which outperforms gold standard corpora on cross-corpus evaluation by up to 11%.

CiteSeerX

Crossref

Event linking: grounding event reference in a news archive

Author: Curran James R
Hachey Ben
Honnibal Matthew
Nothman Joel
Publication venue: Stroudsburg, PA : Association for Computational Linguistics
Publication date: 01/01/2012

Interpreting news requires identifying its constituent events. Events are complex linguistically and ontologically, so disambiguating their reference is challenging. We introduce event linking, which canonically labels an event reference with the article where it was first reported. This implicitly relaxes coreference to co-reporting, and will practically enable augmenting news archives with semantic hyperlinks. We annotate and analyse a corpus of 150 documents, extracting 501 links to a news archive with reasonable inter-annotator agreement.

Macquarie University ResearchOnline

Document-level entity linking: CMCRC at TAC 2010

Author: Ben Hachey
James R. Curran
Joel Nothman
Matthew Honnibal
Will Radford
Publication date: 01/01/2010

This paper describes the CMCRC systems entered in the TAC 2010 entity linking challenge. The best performing system we describe implements the document-level entity linking system from Cucerzan (2007), with several additions that exploit global information. Our implementation of Cucerzan’s method achieved a score of 74.9% in development experiments. Additional global information improves performance to 78.4%. On the TAC 2010 test data, our best system achieves a score of 84.4%, which is second in the overall rankings of submitted systems.

CiteSeerX

Macquarie University ResearchOnline

Naïve but effective NIL clustering baselines - CMCRC at TAC 2011

Author: Ben Hachey
James R Curran
Joel Nothman
Matthew Honnibal
Will Radford
Publication date: 24/04/2020

This paper describes the CMCRC systems entered in the TAC 2011 entity linking challenge. We used our best-performing system from TAC 2010 to link queries, then clustered NIL links. We focused on naïve baselines that group by attributes of the top entity candidate. All three systems performed strongly at 75.4% B³ F1, above the 71.6% median score.
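
As a rough illustration of the kind of baseline and metric this abstract mentions, the sketch below groups queries by one attribute of their top-ranked candidate and scores the grouping with plain B-cubed (B³) precision, recall, and F1. The attribute values, query ids, and function names are invented for the example, and the official TAC scorer may use a variant of B³, so this should be read as an assumption-laden sketch and not the CMCRC implementation.

```python
from collections import defaultdict

def cluster_by_top_candidate(top_attribute_by_query):
    """Group query ids that share the same (hypothetical) top-candidate attribute."""
    clusters = defaultdict(set)
    for query_id, attribute in top_attribute_by_query.items():
        clusters[attribute].add(query_id)
    return list(clusters.values())

def b_cubed(system_clusters, gold_clusters):
    """Plain B-cubed precision, recall, and F1.

    Assumes both clusterings cover exactly the same set of items and that
    each item belongs to exactly one cluster on each side.
    """
    system_of = {item: frozenset(c) for c in system_clusters for item in c}
    gold_of = {item: frozenset(c) for c in gold_clusters for item in c}
    items = list(gold_of)
    precision = recall = 0.0
    for item in items:
        overlap = len(system_of[item] & gold_of[item])
        precision += overlap / len(system_of[item])
        recall += overlap / len(gold_of[item])
    precision /= len(items)
    recall /= len(items)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Toy data, invented for illustration only.
top_attribute_by_query = {"q1": "smith", "q2": "smith", "q3": "jones"}
system = cluster_by_top_candidate(top_attribute_by_query)
gold = [{"q1"}, {"q2", "q3"}]
precision, recall, f1 = b_cubed(system, gold)
```

Because B³ averages per-item scores, a naïve grouping that over-merges is penalised on precision and one that over-splits is penalised on recall.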

CiteSeerX